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ABSTRACT 

Using a grid of ^ 2 million elements = 0.005) adapted from COSMOS photomet¬ 
ric redshift (photo-z) searches, we investigate the general properties of template-based 
photo-z likelihood surfaces. We find these surfaces are hhed with numerous local min¬ 
ima and large degeneracies that generally confound rapid but “greedy” optimization 
schemes, even with additional stochastic sampling methods. In order to robustly and 
efficiently explore these surfaces, we develop BAD-Z [Brisk Annealing-Driven Redshifts 
(Z)], which combines ensemble Markov Chain Monte Carlo (MCMC) sampling with 
simulated annealing to sample arbitrarily large, pre-generated grids in approximately 
constant time. Using a mock catalog of 384,662 objects, we show BAD-Z samples ^ 40 
times more efficiently compared to a brute-force counterpart while maintaining similar 
levels of accuracy. Our results represent hrst steps toward designing template-htting 
photo-z approaches limited mainly by memory constraints rather than computation 
time. 

Key words: methods: statistical - techniques: photometric - galaxies: distances and 
redshifts 


1 INTRODUCTION 

Future large-scale surveys such as Euclid (Laureijs et al. 
2011), the Wide-Field Infrared Space Telescope [WFIRST', 
Green et al. 2012), and the Large Synoptic Survey Telescope 
(LSST; Ivezic et al. 2008) that seek to constrain dark energy 
equation-of-state using weak gravitational leasing (Albrecht 
et al. 2006; Bordoloi et al. 2012) will require the derivation 
of redshifts (z) to an enormous number of objects (> 10^). 
While spectroscopic redshifts (spec-z’s) often are extremely 
precise, their cost- and time-intensive requirements will ne¬ 
cessitate the use of “photometric redshifts” (photo-z’s) de¬ 
rived from htting spectral energy distributions (SEDs) taken 
from a combination of broad- and/or narrow-band photom¬ 
etry in order to measure redshifts to majority of observed 
objects in a feasible amount of time. 

Two main approaches are currently used to derive 
photo-z’s: 

(i) Template fitting, which attempts to determine the 
* E-mail: joshua.speagle@ipmu.jp 


set of forward mappings from a collection of model param¬ 
eters and templates to observed color space. 

(ii) Machine learning, which attempts to directly de¬ 
termine the best inverse mapping from observed color space 
to redshift via a training set of multi-band photometry and 
their corresponding spec-z’s. 

Template fitting-based photo-z codes in use today suffer 
from several modeling and computational deficiencies. Due 
to an insufficient understanding of the relevant parameter 
space, most codes rely on fitting pre-generated “grids” of 
model galaxy photometry to probe corresponding regions of 
interest. This crude, “brute-force” approach not only results 
in inefficient sampling, but also requires trade-offs in param¬ 
eter resolution in order to remain computationally viable. 
As a result, it is generally both too slow and too inaccurate 
to meet the stringent requirements of these future dark en¬ 
ergy surveys, even with very sophisticated implementations 
(Brammer et al. 2008; Ilbert et al. 2009). 

Due in part to these issues, many researchers today have 
turned to machine learning as a way to meet these require¬ 
ments. While current advances in machine learning-based 
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photo-z’s show much promise and perform well with good 
spectroscopic training sets (Carrasco Kind & Brunner 2013, 
2014a; Sanchez et ah 2014; Hoyle 2015; Elliott et ah 2015), 
the current widespread lack of spectroscopic coverage in spe¬ 
cific but relevant regions of color space (Masters et ah 2015, 
submitted) and calibration issues (Cunha et ah 2014; New¬ 
man et ah 2015) indicate that template-fitting methods will 
likely still play a major part in determining good photo-z 
estimates for these future surveys, especially at higher red- 
shifts where spectroscopic coverage is sparser and more sys¬ 
tematically biased. 

We attempt to address some of the computational defi¬ 
ciencies involved in template fitting-based photo-z searches. 
Our main focus is the exploration of the general likelihood 
surface defined by pre-generated template grids and whether 
minimization techniques and clever sampling - both tai¬ 
lored to the general properties of the surface - can subse¬ 
quently accelerate photo-z calculation and form the basis of 
a template-based approach limited by memory availability 
rather than computation time. 

This paper is organized as follows. In §2, we give a brief 
overview of how standard template fitting codes generate 
model photometry and discuss several approaches for ex¬ 
ploring parameter space. In §3, we describe the creation of 
our mock photometric catalog from data in the Cosmolog¬ 
ical Origins Survey (COSMOS; Scoville et al. 2007) field 
that we use for testing, and outline the construction our un¬ 
derlying photo-z model grid. In §4, we explore the overall 
shape of the multidimensional likelihood surface for individ¬ 
ual objects. In §5, we use this knowledge to explore the con¬ 
ditional use greedy minimization algorithms, supplemented 
with stochastic sampling techniques, to quickly locate mul¬ 
tidimensional probability modes. In §6, we present Brisk 
Annealing Driven Redshifts (Z) (BAD-Z), a Markov Chain 
Monte Carlo (MCMC)-based approach that combines en¬ 
semble MCMC sampling with simulated annealing. In §7, 
we compare BAD-Z ’s performance relative to a corresponding 
brute-force counterpart across our full mock catalog. Finally, 
in §8 we discuss possible future extensions of this work. 

We standardize to the AB magnitude system (Oke & 
Gunn 1983) throughout the paper. 


ally (Ilbert et al. 2009) or in some linear combination scheme 
(Blanton & Roweis 2007) to create an underlying galaxy 
template set. To incorporate the impact of specific emis¬ 
sion lines, these galaxy templates are often modified with a 
set of emission line templates co-added according to a set of 
scaling relations taken from the literature (Ilbert et al. 2009; 
Salmon et al. 2015). Rest-frame galaxy templates are created 
by superimposing an additional uniform screen of galactic 
dust taken from a basis set of normalized dust attenuation 
curves (see, e.g., Bolzonella et al. 2000; Chariot & Fall 2000) 
with a given amount of extinction, usually parameterized by 
E{B-V). 

These rest-frame galaxy templates are then redshifted 
by (1+^) and modified by extinction from the intergalac- 
tic medium (IGM; Madau 1995) to form the final observed- 
frame galaxy template. The corresponding set of model pho¬ 
tometry is then generated by convolving the model galaxy 
flux density with the transmission of a particular filter (in¬ 
cluding atmospheric effects) normalized to a source at con¬ 
stant flux density. An illustration of this forward-mapping 
process is shown in Figure 1. 

Generating a new set of model photometry is compu¬ 
tationally expensive, involving multiple addition, multipli¬ 
cation, and power/exponential operations on several large 
(> 10^-10^ element) arrays. Furthermore, although gener¬ 
ating a new model template is around an order of magnitude 
or more computationally taxing than the corresponding fil¬ 
ter convolution, the latter process on its own is still too much 
of a computational burden to be called frequently when com¬ 
puting photo-z’s for individual objects. 

As a result, often the only computationally feasible ap¬ 
proach is to generate a large grid of model photometry for a 
discrete set of parameters before - rather than during - the 
actual SED fitting process. Although other alternatives exist 
(Graff et al. 2012; Akeret et al. 2015), we focus on optimiz¬ 
ing P{z) in the approximately instantaneous likelihood limit 
(i.e., on a pre-generated grid of model photometry) rather 
than in the expensive likelihood limit (where the compu¬ 
tational overhead involved with choosing samples is much 
smaller than time necessary to compute the likelihood) due 
to our emphasis on computational speed. 


2 FROM OBSERVED SED TO PHOTO-Z 

Deriving photo-z’s from observed photometry is composed 
of three main parts: 

(i) Generate model photometry from a list of input pa¬ 
rameters iTiodel(^)- 

(ii) Determine the goodness-of-fit (GOF) between the 
model photometry and the observed data Fobs- 

(hi) Use this information to decide what model parame¬ 
ters to sample. 

As understanding each of these components is crucial to¬ 
wards developing more efficient implementations, we discuss 
each of these in turn. 


2.1 Generating Model Photometry 


2.2 Determining the GOF 


In order to derive accurate multidimensional likelihoods, we 
must choose a suitable GOF metric. Although there are 
many possible choices, most routines use the simple met¬ 
ric in order to incorporate uncertainties on the photometry 
and/or model in a straightforward manner, where 


X^{x,s) = 


EohSji -SFmodeljZ 


( 1 ) 


where s is an associated model scale-factor, = crobs,z + 
cr^odei,i is the total variance, and the sum over i is taken 
over all observed bands. For a given x we can marginalize 
over s to minimize x^(x), giving us 


s = 


E 


Fobs, i Frxiodel, i 



'^modelji 


( 2 ) 


To generate model photometry, most codes begin with a set This is a simple one-step process that can be calculated prior 

of “basis” galaxy templates which are used either individu- to computing the actual value. 
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Figure 1. A schematic of how model photometry is generated for a given set of parameters and a collection of galaxy, emission line, and 
reddening templates. The initial galaxy template (black) is first modified by adding on emission lines (purple) before being reddened by 
a uniform galactic dust screen (red). The template is then redshifted (orange) and reddening from the IGM is applied (green) before 
being convolved with a given filter set (blue) to compute the final model photometry (blue circles). 


While is valid for normally distributed data at any 
level of signal-to-noise (S/N), it does not incorporate any 
method of treating upper limits, which are relatively com¬ 
mon in astronomical data. As a result, many codes (incor¬ 
rectly) implement ad-hoc procedures in an attempt to in¬ 
clude this more limited set of information. This leads to 
small but non-negligible biases in fitting routines for low S/N 
data involving upper limits. 

In this case, a modified metric can be derived to 
properly incorporate upper limits (Sawicki 2012). For a given 
set of observations divided into a subset I of observed values 
Fobs (included in the sum over i) and a subset J of upper 
limits Fim (included in the sum over j), 

xhodix, s) = s) + xlp,j(x, s), (3) 

where X/(f -s) is the usual statistic, and 


xlp,j(x,s) = -2^1n 


erfc I sFrxiodeljj' 

\/2(7j 


(4) 


is the corresponding term for upper limits, where erfc is the 
complementary error function. In this case, there no longer 
exists a simple analytic expression for s given x, and the 
above expression must instead be minimized using numeri¬ 
cal methods. As a result, it is much more useful to under¬ 
take SED fitting in linear flux space, which can accommo¬ 


date negative fluxes present in low-S/N data, rather than in 
magnitude space, which cannot.^ 


2.3 SED Fitting 

As outlined in §2.1, template fitting methods generate new 
model photometry by performing a series of operations on 
a set of templates before convolving the final product with 
the relevant filter set. As this process is expensive, most 
current approaches choose to pre-generate a large grid of 
model templates containing ~ 10®“^ individual sets of pho¬ 
tometry. Due to the enormous reduction in computing time 
that can be achieved on a pre-computed grid - calling 
given Fmodei(^) is several orders of magnitude faster than 
generating a new model from scratch - pre-generating pho¬ 
tometry from a large set of parameter combinations often 
saves orders of magnitude of computation time compared to 
repeatedly computing them in real time, especially given the 
large number of objects usually involved. 

In order to optimize searches on such grids, we explore 
three basic classes of methods. The “brute-force” approach 
(i.e., fitting the entire grid) is discussed in §2.3.1, while more 


^ “Luptitude” space (Lupton et al. 1999) is also a valid alterna¬ 
tive. 
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adaptive Markov Chain Monte Carlo (MCMC) and nested 
sampling (NS) methods are discussed in §2.3.2 and §2.3.3, 
respectively. We pay particular attention to their specific 
benefits and drawbacks as well the associated amount of 
computational overhead. 

2.3.1 Brute-Force Techniques 

The approach taken by most template-fitting photo-z codes 
today (e.g., Le PHARE; Arnouts et al. 1999; Ilbert et al. 2006) 
is to fit the entire pre-generated grid of points to each ob¬ 
ject to build up a model of the full A’-dimensional likelihood 
P{x\Fohs) at a predetermined resolution. Afterwards, mag- 
inalized probability distribution functions (PDFs) can be 
created through 

P{xi\Fohs) OC j P{x\Fohs)dx.J K'^P{x\Fohs), (5) 

XJ 

where xi are the subset of parameters of interest and xj are 
the subset of parameters to be marginalized over. 

While generally effective (Hildebrandt et al. 2010; 
Dahlen et al. 2013), brute-force approaches are subject to 
two major issues: 

(i) Inefficient at probing region(s) of interest. Brute-force 
methods tend to spend the majority of time for any given 
object (> 99%) sampling regions of extremely low probabil¬ 
ity. 

(ii) Scale proportional to dimensionality of problem. The 
corresponding size of the grid increases multiplicatively with 
the number of dimensions to be probed and the desired gran¬ 
ularity of each dimension. 

These caveats aside, however, implementations of brute- 
force approaches can often run relatively quickly compared 
to other methods that might nominally sample more effi¬ 
ciently due to the straightforward setup and lack of any real 
computational overhead, which can take full advantage of 
parallel processing as well as vectorization of repeat opera¬ 
tions. 


2.3.2 Monte Carlo Markov Chain-based Techniques 

Unlike grid-based approaches, MCMC-based algorithms 
(see, e.g., SatMC; Johnson et al. 2013) sample at a rate pro¬ 
portional to the PDF itself and weight every accepted trial 
point evenly (although see Bernton et al. 2015). A stan¬ 
dard search heuristic employed by most MCMC codes is 
the Metropolis-Bastings algorithm (Metropolis et al. 1953; 
Hastings 1970): 

(i) Draw a set of trial parameters Xn from the neighbor¬ 
hood function^ q{x\xn-i)- 

(ii) Accept the new trial and move to location Xn 

with probability min (l, A ■ Otherwise, 

^ V P(xri-i\F) q{xn-i\xn) J ’ 

remain at Xn-i- 

(iii) Repeat from step (i) until a stopping criterion is 
reached. 

^ Also often referred to as a “proposal distribution”. 


This procedure is used to guide several individual “chains” 
of related draws as they converge (“burn in”) to and eventu¬ 
ally begin sampling from the region of interest. Although the 
exact neighborhood function chosen is ultimately arbitrary, 
most often an Wdimensional multivariate-normal distribu¬ 
tion is used, where p = Xn-i is the mean vector 

(adjusted at each step) and X) = <7^1 is the covariance ma¬ 
trix (I being the identity matrix), also often adjusted over 
the course of a typical run. 

Because MCMC-based algorithms sample at a rate ap¬ 
proximately proportional to the PDF in a given region of in¬ 
terest, they are able to explore large, iV-dimensional spaces 
at finer resolution and with far fewer function calls than 
grid-based approaches while scaling relatively slowly with 
the dimensionality of the problem (up to moderate dimen¬ 
sionality; Handley et al. 2015). However, MCMC-based ap¬ 
proaches are still subject to several issues: 

(i) Dependent on the form of the neighborhood function. 
The performance of MCMC-based algorithms is sensitive to 
the precise construction and implementation of the neigh¬ 
borhood function. 

(ii) Only samples locally. MCMC approaches cannot reli¬ 
ably recover multimodal distributions where the modes are 
separated by distances much larger than the neighborhood 
function. 

(iii) Fails to generate fully independent draws. Due to 
their random-walk nature, individual chains should be 
“thinned” to ensure fully independent random samples. 

(iv) Ineffective during burn-in. Trials taken during the 
burn-in phase cannot be used to derive the PDF because 
they are sensitive to the chains’ initial positions. 

An MCMC-based approach running on a large, pre¬ 
generated grid represents a more efficient way of develop¬ 
ing a model of the likelihood by exploring only the por¬ 
tion of the grid that contains the majority of the probability 
(see, e.g., SpeedyMC; Acquaviva et al. 2012). As MCMC algo¬ 
rithms only involve a small amount of computational over¬ 
head^ for each likelihood evaluation, large efficiency gains 
can be achieved while retaining similar levels of accuracy to 
brute force approaches. In addition, since MCMC methods 
generally tend to sample in approximately constant time at 
fixed dimensionality, arbitrarily large grids can be used with¬ 
out substantially increasing the computation time, provided 
they can be loaded into memory. 

2.3.3 (Importance) Nested Sampling-based Techniques 

Importance Nested Sampling (Feroz & Hobson 2008; Feroz 
et al. 2009, 2013) provides an interesting approach to deal 
with the degeneracies observed in photo-z parameter space, 
as the method is designed to sample multimodal likelihoods. 
During INS, iViive “live” points are drawn from the prior 
P{x). At each subsequent iteration z, the point with the 
lowest Ci is removed from the set and replaced with another 
drawn from the prior under the constraint > Ci. In or¬ 
der to efficiently draw unbiased samples from the likelihood- 

^ Assuming the neighborhood function remains fixed. If it is in¬ 
stead allowed to adapt during the burn-in phase, the amount of 
computational overhead becomes somewhat larger. 
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constrained prior, at each iteration i the full set of A^uve 
points is decomposed to lie within a set of (possibly over¬ 
lapping) ellipsoids that can be evolved separately, allow¬ 
ing the algorithm to probe widely separated minima. As 
this sampling process is repeated, the live particles move 
through “nested” shells of constrained likelihood as the prior 
volume contained in the union of ellipsoids is steadily re¬ 
duced, finally terminating after reaching a specific tolerance 
threshold. An estimate of the posterior can then be obtained 
from the entire set of trials via using a corresponding set of 
weights (Cameron & Pettitt 2013), as outlined in Feroz et al. 
(2013). 

While INS enables the determination of more accu¬ 
rate PDFs by capturing degeneracies that might otherwise 
be missed by MCMC-based approaches, it is substantially 
slower due to the large computational overhead involved 
with computing the ellipsoidal decompositions (large) and 
the associated weights (moderate) for a pre-defined number 
of iterations. As a result, we limit our investigation in §6 to 
MCMC-based algorithms, and defer analysis of INS-based 
approaches to possible future work. 


3 GENERATING A REALISTIC MOCK 
CATALOG 

To avoid possible issues caused by template mismatch (es¬ 
pecially when compared to the full range of observed galaxy 
SEDs, including emission line variation) and other possible 
systematic effects, we opt to explore the parameter space 
spanned by a mock catalog of galaxies using the same set of 
templates we hope to later test. 

We begin the construction of our mock catalog using 
the high-quality photometry (~ 30 bands) available in the 
COSMOS field (Capak et al. 2007). Photo-z’s are derived 
with the brute-force template-fitting code Le PHARE using a 
grid with the following specifications: 

(i) Nz — 601: Spans the redshift range z = 0-6 in steps 
of Az = 0.01. 

(ii) At, gal = 31: Includes 8 elliptical templates and 11 
spiral templates (constructed by linearly interpolating be¬ 
tween the Polletta et al. (2007) templates), supplemented 
with 12 starburst (SB) templates constructed from Bruzual 
& Chariot (2003) SPS models assuming exponentially de¬ 
clining SFHs with ages ranging from 3 to 0.03 Gyr. 

(iii) Aemiine = 3: Only a single template including Lya, 
0[ii], H/3, 0[iii], and Ha is used. This is added to each tem¬ 
plate prior to applying reddening effects according to {0.5, 
1.0, 2.0} times the scaling relations outlined in Ilbert et al. 
(2009). 

(iv) At, dust = 5: The dust curves used include obser¬ 
vations from the SMC (Prevot et al. 1984), LMC (Fitz¬ 
patrick 1986), MW (Seaton 1979; Allen 1976), and SB galax¬ 
ies (Calzetti et al. 2000). 

(v) Ne(b-v) = 9: Each template is allowed E{B-V) val¬ 
ues of {0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5}. 

After correcting for zero-point offsets using the spectroscopic 
sample from zCOSMOS (Lilly et al. 2007, 2009), a subset of 
the grid is then fit to each object and marginalized over to 
derive the associated redshift PDE. See Ilbert et al. (2009) 
for additional information. 


Using the median P{z) value (regenerated to higher 
Az resolution than the underlying grid), best-fitting tem¬ 
plate, reddening law, E{B-V), and scaling factor derived 
from Le PHARE for each galaxy, we create a collection of cor¬ 
responding model galaxy templates that mimic the obser¬ 
vational data.^ We convolve these templates with a set of 
12 filters designed to mimic the wavelength ranges probed 
by the future Cosmological Advanced Survey Telescope for 
Optical and UV Research {CASTOR; EVcUcgc) and Euclid 
(VwJwHw) missions supplemented with ground-based pho¬ 
tometry {ugrizY). These provide a wide wavelength range 
to detect spectral features, including 3 overlapping bands 
(m, g, and Y). 

Erom an initial sample of ~ 2 million galaxies, we imple¬ 
ment an Hw = 24.5 mag cut to mimic the 5-cr imaging depth 
of the wide-held Euclid survey. This leaves us with ~ 20% 
of the original sample, or 384,662 galaxies spanning a red¬ 
shift range from z = 0 to ~ 3.2, including a small number of 
objects up to ~ 6 (Eigure 2). 

These photometric huxes are then jittered according to 
the expected background noise levels based on the antic¬ 
ipated depth of the imaging in each band^ to create the 
hnal mock catalog. The distribution of photometry (in mag¬ 
nitudes) as a function of redshift in each of the hlters is 
shown in Eigure 3 along with their respective 5-cr imaging 
depths. To avoid complications arising from the inclusion of 
upper limits (see §2.2), we leave the model photometry in 
hux space. 

In addition, we also compute a baseline GOE value for 
each object (Xbase) by determining the associated values 
between the original (pre-jitter) model photometry and fi¬ 
nal (post-jitter) mock photometry. This allows us to see the 
extent to which our errors altered the original GOE while 
also providing a check on the overall quality of the best fits 
determined by our code(s). In particular, given optimal per¬ 
formance and an ideal model set, the derived Xobj foi* 
object should always satisfy the condition xlhi ^ Xbase- 

In other words, an ideal photo-z code should always find 
either (1) the “correct” solution or (2) an “incorrect” one that 
is a better fit to the data. This is not always true in prac¬ 
tice, especially in cases (such as ours) where the associated 
redshift has been regenerated to a higher resolution than 
the corresponding Az spacing of the grid. While being able 
to find the “best” match to the data is a desirable feature 
of an effective fitting routine, it is not strictly necessary - 
as long as the general region around the best fit is probed 
sufficiently well (i.e., the relative shape of the PDE can be 
recovered even when sparsely sampled), a given algorithm 
should still be able to derive accurate P(z)’s and associated 
estimates. 


Due to the coarse nature of the emission line grid, we do not 
include emission line contributions to simplify the nature of our 
tests. This does not impact our conclusions, although we explore 
ways of incorporating this additional complexity in future work 
(Speagle et al. 2015, in preparation). 

^ This calculation does not include error from shot noise from 
galaxy photons, which is expected to be on the order of ~ 2% at 
the 3cr detection limit. 
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Figure 2. The redshift distribution of our mock catalog of 384,662 COSMOS galaxies. Although the distribution extends out to 2 ; = 6, 
the majority of galaxies are located at 2 ; < 3.2. 


4 MAPPING THE PHOTO-Z LIKELIHOOD 

SURFACE 

Using a finely spaced grid of ~ 2 million sets of model pho¬ 
tometry - identical to the one used by Le PHARE except with 
an improved /\z resolution of 0.005 - we calculate the 
value at every trial point for a series of individual objects 
from our mock catalog. We then derive the corresponding lo¬ 
cations and likelihoods for competing minima as a function 
of the Euclidean distance (normalized to the grid spacing 
in each dimension) from the global minimum. To determine 
the relative “size” of each minimum, we force each point on 
the grid follow the surrounding (discrete) gradient until it 
locates the closest corresponding minimum and record the 
hnal number of trial points occupying each local minimum. 

To understand how this underlying structure corre¬ 
sponds to the output P{z) distribution, we marginalize over 
ah trials to derive P{z). Together, these pieces of informa¬ 
tion not only inform us about the approximate general dis¬ 
tribution, size, depth, and behavior of minima within the 
4-D parameter space, but also how the corresponding struc¬ 
ture affects the hnal marginalized distribution of interest. 
The results for three representative objects are plotted in 
Figure 4. 

We hnd several main features of interest: 

(i) The main global minimum is surrounded by numerous 
competing minima that occupy sizeable regions of parame¬ 
ter space (middle panels), indicating that the space directly 
in the area of the global minimum and other notable degen¬ 
eracies (e.g. confusion over the 1216 A and 4000 A breaks) 
are quite “bumpy”. 


(ii) The redshift-reddening degeneracy (i.e., red objects 
may either be dusty or at high(er) redshift) appears to oc¬ 
cupy a signihcant region of parameter space (bottom pan¬ 
els), although it’s overall contribution to the marginalized 
likelihood is (usually) small (top panels). 

(hi) A series of htting artifacts (i.e., edge effects) due to 
highly mismatched templates lead to small “ridges” within 
the corresponding region of parameter space (bottom pan¬ 
els) that have negligible probability but occupy a small re¬ 
gion of parameter space that needs to be avoided. 

Most crucially, almost none of this substructure appears in 
the hnal redshift PDF (top panels). As a result, not only 
does the full 4-D space appear to contain hundreds to thou¬ 
sands of local minima with degenerate regions a signihcant 
distance away from the global minimum occupying large re¬ 
gions of parameter space, but most of this structure remains 
hidden when investigating marginalized 1-D P{z) distribu¬ 
tions only. 

Further investigations in Speagle et al. (2015, in prepa¬ 
ration) suggest that much of this substructure is due to the 
combination of 1-D projections of non-linear effects in the 
creation of the grid. Galaxy SEDs occupy a very non-linear 
iV-dimensional manifold in color space, with complex and 
correlated changes in the overall shape. However, when used 
in most grids, they are often assigned arbitrary numbers 
based on some simple diagnostics (e.g., FUV hux), equiva¬ 
lent to a crude 1-D projection. This projection loses valu¬ 
able information concerning the higher-dimensional mani¬ 
fold, and when combined with the equivalently crude projec¬ 
tion of dust attenuation curves (where the ordering is often 
equally arbitrary) and redshift evolution, creates a number 
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Figure 3. Photometry (in magnitudes) in each band for all objects in our mock COSMOS catalog as a function of redshift, with 50%, 
70%, 90%, 95%, and 99% contours overplotted in red. 5-a upper limits are denoted by horizontal dashed red lines. Our mock catalog 
spans a wide range of magnitudes and imaging depths in all available bands. Our H = 24.5 mag cut (to mimic the imaging depth of the 
Euclid wide-held survey) can be seen in the bottom-right panel. 


of (wide) local minima that might otherwise not exist. While 
this indicates that a more suitable choice of grid might elim¬ 
inate some of the bumpiness and/or edge effects, the broad 
features observed in Figure 4 still remain valid in general. 


5 OPTIMIZATION IN 

REDSHIFT-REDDENING SPACE 

Although the general likelihood surface is relatively rough, 
smaller dimensional slices might be smoother and thus more 
amenable to “greedy” (i.e., strict gradient-descent) optimiza¬ 
tion methods (e.g., Levenberg-Marquardt; Levenberg 1944; 
Marquardt 1963) that could be used to accelerate photo-z 
calculations. Since many of the most effective greedy algo¬ 
rithms used in higher-dimensional minimization problems 
only function properly for a (quasi-)continuous set of input 
parameters, to avoid the difficulty in interpolating between 
discrete templates we explore their conditional use on re¬ 
duced 2-D z-E{B-V) subspaces for a given galaxy template 
and dust attenuation curve. 

Since we are minimizing quantities that are both non¬ 
negative and bounded, we focus on the use of constrained 


minimization algorithms. Although the results presented in 
this section use Constrained Optimization by Linear Ap¬ 
proximation (COBYLA; Powell 1994), we test that they re¬ 
main largely unchanged if other (un) constrained minimiza¬ 
tion algorithms (and associated penalty functions) are used. 

In order to investigate the possibility of multiple, 
widely-separated minima within the subspace, we supple¬ 
ment our chosen greedy minimizer with a simple version of 
“basin-hopping” (Wales & Doye 1997), an iterative stochastic 
metaheuristic (i.e., a heuristic that uses pre-existing search 
heuristic) designed to probe spaces with a few deep but 
widely separated degeneracies, ft acts on a given minimiza¬ 
tion routine as follows: 

(i) Given the position Xmin,i of the last accepted mini¬ 
mum, randomly jump to a new coordinate Xstart,j based on 
a given neighborhood function g(x|xmin,0-^ 

(ii) From Xstart,j, use a chosen minimization algorithm to 
find the nearest minimum Xmin,j. 


® As with MCMC approaches, this is often chosen to be an N- 
dimensional multivariate Gaussian. 
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Distance from Global Minimum Distance from Global Minimum Distance from Global Minimum 


Figure 4. 4-D maps of the photo-z likelihood surface for three representative objects from our mock catalog shown from left to right. 
Top: Marginalized P{z) distributions for each object. Individual grid points are plotted as black dots, with a cubic spline fit shown 
with a solid red line and the location of the peak marked by a dashed red line. In all cases P{z) distribution is narrow and well-defined. 
Middle: The values of individual minima plotted as a function of the Euclidean distance from the global minimum. Each plot shows 
all minima found within Az ~ 0.5 {Az = 0.1 is ~20 grid units). The large number of minima indicates the region around the global 
minimum value is quite bumpy. However, since most minima have significantly higher values than the best-fit, they tend to have an 
extremely small impact on the marginalized P{z)^s (top panels). Deriving accurate P(z)^s thus relies heavily on being able to locate and 
sample the small region surrounding the global minimum. Bottom: The probability (normalized to 1) that a random trial point on the 
grid will reach a specific minimum, plotted as a function of Euclidean distance from the global minimum. While the global minimum 
is a far better fit than most competing minima (middle panels), the latter occupy large areas of parameter space, making it extremely 
difficult for local and/or gradient-based minimization algorithms to reach relevant P{z) modes. The impact of degeneracies at widely 
separated distances from the global minimum (clusters of large minima at dgrid ^ 100) as well as edge effects (“furrows” at dgria ^ 300) 
can also be seen, both of which further impede this process. 


(iii) Replace Xmin,i^z+i with the new coordinates Xminj 
according to the Metropolis-Hastings criterion. 

(iv) Repeat from step (i) until a stopping criterion is 
reached. 

(v) Select the global minimum from the collection of all 
accepted minima {xmin,z}. 

This procedure is conceptually similar to an MCMC algo¬ 
rithm, where instead of calling a likelihood function at each 
iteration we run a specihc minimization routine starting 
from that location. To avoid any procedural hne-tuning, we 
let q{x) be uniform over the entire z and E(B-V) interval 
and only use a hxed number of A^hops runs. 

The output for one such object given the “correct” (i.e., 
intrinsic) galaxy and dust templates is shown in Figure 5. 
We hnd the total number of trials required to hnd the global 
minimum in this subspace is relatively small, with iVtriais ^ 
1500 for A^hops = 10. This corresponds to ~ 30% of the size 
of the corresponding coarse z-E{B -V) grid, albeit sampled 
at arbitrary (rather than hxed) resolution. As A^triais scales 
approximately linearly with A^hops, the true efficiency for 
an algorithm tailored to the specihcs of this space would 


likely be somewhat higher. As a result, we hnd that running 
a constrained stochastic minimization procedure for every 
combination of discrete parameters by itself is at least a 
factor of three more efficient at locating the global minimum 
(with improved accuracy) than a standard grid search. 

Unfortunately, we hnd that many of the unwanted fea¬ 
tures present in our full 4-D map (Figure 4) are also present 
in this reduced subspace. ^ As a result, we conclude that 
the number of basin-hopping trials necessary to ensure that 
greedy algorithms will reliably converge to the global min¬ 
imum is likely quite large, negating most of the potential 
gains it may have provided. 


^ These features are also not limited to a specific galaxy-dust 
template combination - for a similar procedure computed us¬ 
ing a slightly mismatched galaxy template-dust curve combina¬ 
tion, we observe the same general features. For extremely mis¬ 
matched templates (e.g., trying to fit an elliptical to a slightly 
reddened spiral), however, we instead find the algorithm repeat¬ 
edly terminates at boundary locations (i.e., grid edges) such as 
lz,EiB-V)] = l0,0]. 
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Figure 5. COBYLA minimization runs using stochastic basin-hopping (A^hops = 10) ^ 2-D z-E{B - V) slice for a sample object at 

[z^ E{B-V)] = [2.2221,0.2] given the “correct” underlying galaxy-dust template combination. Each trial point is plotted and colored 
according to the corresponding scaled log-x^ value while the best value located during each run is marked using red squares. As Figure 4, 
we find the reduced subspace retains a significant number of local minima surrounding the global best-fit value (cyan star), a significant 
confounding factor for standalone uses of “greedy” minimization algorithms on pre-generated grids of model photometry. 


6 DESIGNING AN EFFICIENT AND ROBUST 

PHOTO-Z SEARCH ALGORITHM 

Based on these results and the discussion in §2.3, we now 
explore the use of MCMC sampling to explore large pre¬ 
generated grids quickly and efficiently. We find that the en¬ 
semble MCMC sampling implementation from Goodman & 
Weare (2010) and implemented in emcee® (Foreman-Mackey 
et al. 2013) addresses most of the issues regarding MCMC 
approaches outlined in §2.3.2 while avoiding many of the 
drawbacks. As a result, we opt to use it as the underlying 
foundation of our algorithm and describe the basic imple¬ 
mentation below. 

In brief, rather than spawning M chains that sample 
from a neighborhood function, we instead spawn a (much 
larger) ensemble of A ^ M “walkers” which each sample 
from on the current locations of the complementary ensem¬ 
ble (i.e., the other N —1 walkers) rather than any individual 
walker’s local neighborhood. By eliminating the neighbor¬ 
hood function in this manner, we decrease the fine-tuning 
often required for MCMC approaches while also ensuring 
that every trial is nearly independent, simultaneously solv- 

® http://dan.iel.fm/emcee 


ing a number of issues raised in §2.3.2. For our purposes then, 
ensemble MCMC sampling gives all the benefits inherent in 
an MCMC-driven approach on a pre-computed model grid 
while eliminating the majority of the drawbacks. 

However, although the general implementation outlined 
above is powerful for sampling around the region of inter¬ 
est, it will still be relatively inefficient during the burn-in 
phase. To assist with this, we turn to simulated annealing^ a 
metaheuristic designed to assist searches such as these where 
the goal is to find the global minimum in a bumpy and 
often significantly multimodal space. The overall method 
involves imposing a global temperature on either the en¬ 
tire space (the standard implementation) or individual sam¬ 
plers/regions (otherwise known as “tempering”; see Johnson 
et al. 2013) that distorts the shape of the space such that 

P{x) , (6) 

where T(t) is the temperature as a function of time (i.e. 
iteration) and To is the “transition temperature” that we 
henceforth take to be 1. 

For T(t) > To, “bad” jumps have a higher relative prob¬ 
ability of being accepted, allowing an algorithm additional 
stochasticity while sampling. As a result, it is able to explore 
more of the search space in the hopes of finding the region 
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Figure 6. Individual trials as a function of position in terms of (left) and P = e~^ (right) for a 1-D projection of a toy 4-D function 
designed to broadly mimic the features of our photo-z space described in §4 and observed in Figure 4. A portion of a 1-D component 
of this function is shown in blue (left central inset). The dense sampling in regions surrounding the global minimum illustrates that the 
algorithm is sampling effectively and can robustly locate regions of high probability in significantly bumpy parameter spaces. 


around the global minimum. For T(t) < To, bad jumps in¬ 
stead become relatively less likely of being accepted, reduc¬ 
ing stochasticity and increasingly constraining the algorithm 
to only move in the direction of the gradient. Ideally, this 
forces our algorithm to find a more optimal solution assum¬ 
ing that it has reached the general region of interest. 

To test whether that this approach will improve burn- 
in performance in the region we are hoping to explore, we 
investigate the performance of a simulated annealing-driven 
MCMC-based algorithm on a sample function designed to 
mimic the overall properties of the photo-z parameter space 
we observed in §4. Using an ensemble of 250 walkers, a = 2, 
T{t = 0) = 2.5, and AT = 0.01 per ensemble run (i.e. after 
iterating over all walkers once), we verify that not only is 
the combined algorithm able to effectively locate and char¬ 
acterize the region of interest (Figure 6), but that the per¬ 
formance at both low and high T is important in locating 
the global minimum. 

Although simulated annealing is effective at locating the 
general area of the peak, since it distorts the likelihood over 
the course of the run (thus violating “detailed balance”), 
samples cannot be used to reconstruct the associated PDF. 
Our solution is to only include simulated annealing during 
the burn-in phase to help locate the global minimum. Af¬ 
terwards, we set T(t) to 1, re-spawn an ensemble of walk¬ 
ers in an A’-dimensional multivariate Gaussian distribution 
around the best-fit value located by the ensemble, and re¬ 
construct P{z) from a subset of the latter trials. 

As mentioned in §2.3.2, in almost all cases MCMC- 
based methods have a difficult time effectively character¬ 
izing multimodal spaces. In particular, the efficiency of en¬ 
semble MCMC sampling rapidly declines when applied to 
multi-modal surfaces, especially when the modes are sharply 
peaked (Foreman-Mackey et al. 2013). While respawning the 
ensemble of walkers around the best-fit set of parameters 
will by default miss truly multimodal P(z)’s, we have cho¬ 


sen this approach as a compromise to give a slightly biased 
view of the “true” P{z) rather than completely undersam¬ 
pling and/or mischaracterizing it. 

We term our photo-z algorithm Brisk Annealing- 
Driven Redshifts (Z) (BAD-Z), which is summarized below: 

(i) Initialize W walkers on an arbitrary A-dimensional 
input grid drawn from a uniform distribution. 

(ii) Start a simulated annealing-driven ensemble MCMC 
run with a given set of T(t = 0) and AT/A (ensemble run) 
values to find the region of the global minimum. 

(iii) Spawn a new distribution of W walkers in an N- 
dimensional multivariate Gaussian with some fractional 
spread crfrac in each dimension around the trial with the 
highest likelihood from the previous simulated annealing- 
driven run. 

(iv) Set T{t) = 1 and AT/A (ensemble run) = 0 and start 
a new run for an additional Amcmc number of ensemble 
runs. 

(v) After discarding an initial fraction /disc of trials from 
this second run, reconstruct the redshift PDF using the re¬ 
maining accepted trials. 

BAD-Z is written C and parallelized using OpenMP. 


7 TESTING PERFORMANCE AGAINST A 

GRID-BASED APPROACH 

In order to showcase the performance of BAD-Z relative to 
an appropriate baseline, we create a brute-force counterpart 
(also coded in C and parallelized using OpenMP) that func¬ 
tions exactly as described in §2.3.1, a process that involves 
a total of ~ 1.6 x 10® trials per object. We run both codes 
on the mock photometric catalog described in §3 using the 
same grid outlined in §4. 

While our brute-force code by dehnition hts the en- 
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Figure 7. Top left: The minimum log(reduced-x^ found among the full set of walkers after each iteration through the entire ensemble 
for an random object from our mock catalog. In this specific case, BAD-Z converges to the best-fit set of parameters Xbest after ~ 60% 
of the simulated-annealing driven portion of the run (0.6% the size of the corresponding grid), and relocates it relatively quickly after 
the walkers have re-spawned for the “vanilla” (T(t) = 1) portion. Top right: As the top left panel, but for the best-fit redshift ^best- 
Similar behavior can be seen, although small perturbations away from 2:best are now visible. Bottom left: The best-fit model (blue 
circles) versus observed (black circles with red error bars) photometry plotted at effective wavelength in each filter. For clarity, the model 
photometry has been slightly shifted. The observed agreement between the two sets of points indicates the overall quality of the final 
fit(s) as seen in the top left panel. Bottom right: The redshift PDF P{z) derived from the final 40% of all trials as a function of time 
(blue to red). As we are consistently probing the area directly around :rbest (see top left), BAD-Z is able to effectively reconstruct the 
underlying PDF using only 18,000 independent trials, or about 1.1% of the size of the corresponding grid. 


tire grid to each individual object, BAD-Z traverses this 
grid utilizing an ensemble with A^waikers = 150, a = 2, 
T{t = 0) = 3.0, AT/A(ensemble run) = 0.03, cruac = 0.15, 
Nmcmc = 200, and /disc = 0.4. Note that this is a deliber¬ 
ately conservative choice of parameters that leads to a total 
of 45,150 trials (2.7% of the full grid) per object (i.e., 301 
ensemble runs), of which only 18,000 (1.1% of the full grid) 
are used in the reconstruction of the final PDF. An example 
run for an individual object in our mock catalog is shown in 
Figure 7. 

Using the reconstructed P{z) for each individual ob¬ 
ject, we classify the best-fitting redshift as the median of 
the distribution Zmed = median(P(z)). The 2-D distribu¬ 
tions of input redshift Zin versus fitted redshift Zfn = ^med 
for both methods are shown in Figure 8. Although BAD- 


Z samples ~40 times less than its brute-force counterpart, 
it produces photo-z estimates with comparable accuracy to 
the brute-force approach and similar catastrophic outlier 
(l^fit — ^in|/(l + ^in) > 0.15) rates (0.7% and 1.0% for BAD-Z 
and brute-force, respectively).^ 

In addition, we find that BAD-Z is quite effective at find¬ 
ing good fits to the data, locating “optimal” (xit/Xbase ^ 1) 
fits in ~ 55% of cases and “reasonable” (xit/Xbase ^ 5) ones 
in ~ 90%. As the resulting redshifts are accurate in the ma¬ 
jority of fitted objects (Figure 8), this indicates that even 
in cases where BAD-Z fails to find the “best” fit to the data 
(for a variety of possible reasons), it still manages to probe 

^ We find similar results using redshift estimates derived from 
the mean and mode of each P{z), which are not shown. 
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Figure 8 . The distribution of input redshifts Zin versus fitted redshifts (70%, 95%, and 99% contours plotted) for our brute-force 
code (black) and our MCMC-based, simulated annealing-driven BAD-Z code (red) using a grid of ~ 2 million elements (Az = 0.005) at 
2 : < 3.2. The marginalized distribution of the fractional redshift error 77 = + z-^^) is shown in the upper left inset with the 

same color scheme, with the usual threshold for “catastrophic errors” (?7cat = 0.15) indicated with dashed red lines. The catastrophic 
outlier fractions for the two samples are 1.0% and 0.7%, respectively. In addition to being ~40 times more efficient than the brute-force 
approach, BAD-Z provides similar levels of overall accuracy while retaining the ability to accurately capture the underlying redshift PDF. 


the surrounding region to high enough accuracy that the 
marginalized P[z) distribution gives accurate predictions. 

Finally, we compare the similarity between the P(z)’s 
computed from BAD-Z to those computed with our brute- 
fore code to determine the general ability of our new code to 
recover accurate PDFs as well as the overall redshift distri¬ 
bution. The [16th, 50th, 84th] quartiles of the difference be¬ 
tween the median P{z) solutions computed with both codes 
is [-3%, -0.5%, 1.5%], indicating that while BAD-Z has a slight 
negative bias, the overall agreement between both codes is 
excellent. We do find however, that the distribution about 
the correct redshifts is slightly broader for BAD-Z than for our 
brute-force code, with a two-sample Kolmogorov-Smirnov 
(KS) test statistic oi p — 0.03 (2.2-cr). 

To confirm this, we investigate the distribution of er¬ 
rors for individual objects, finding that the PDFs computed 
by BAD-Z are indeed preferentially broadened relative to the 
brute-force code by ~ 50% (median). This is likely caused by 


the limited number of samples used to reconstruct the PDF 
(due to our high choice for /disc) and the discrete nature 
of the grid (which can lead to a “pile-up” of our walkers at 
specific grid points, reducing our effective resolution). Ulti¬ 
mately, we find both codes produce N[z) distributions that 
are consistent with being drawn from the same parent pop¬ 
ulation as the spec-z distribution and are consistent with 
being drawn from the same parent population relative to 
each other (KS p-values > 0.05 in all cases), indicating that 
although the PDFs provided by BAD-Z might be “rougher” 
than those derived from sampling the entire grid they still 
provide reliable measurements. 

In summary, by understanding the general topology 
of the relevant photo-z likelihood surface seen by a spe¬ 
cific grid of pre-computed model photometry, we are able 
to design an algorithm that is substantially more efficient 
than traditional brute-force approaches while retaining sim¬ 
ilar levels of accuracy. Our ensemble MCMC-based, simu- 
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lated annealing-driven BAD-Z algorithm performs extremely 
well on the bumpy, degenerate photo-z likelihood surface 
explored in this work, giving good fits in > 90% of cases 
with a comparable catastrophic outlier fraction (0.7% ver¬ 
sus 1.0%). BAD-Z thus illustrates the power inherent in the 
general methodology outlined in this section. 

Although these results are promising, we wish to em¬ 
phasize that BAD-Z is subject to the limitations inherent in 
all MCMC sampling approaches regarding the stochastic re¬ 
construction of PDFs as a function of sample size and the 
limitations of locating and converging to the target distri¬ 
bution during burn-in. While we have attempted to bypass 
these problems through the use of ensemble MCMC sam¬ 
pling (to allow for more exploration of the parameter space 
and better reconstruction of the PDF), simulated anneal¬ 
ing (to help ensure robust convergence to the global mini¬ 
mum), and the large width of our “respawned” distribution 
after burn-in (with a 1-cr width of 15% of the corresponding 
parameter space), our chosen method by construction will 
mis-characterize truly multi-modal distributions, especially 
those with a large number of widely-separated modes with 
similar amplitudes. 


8 CONCLUSION 

While photometric redshifts (photo-z’s) represent an inte¬ 
gral part of modern extragalactic science, outstanding issues 
that currently plague template fitting-based approaches are 
concerning in the face of looming future “big data”-oriented 
surveys. This work represents the first steps towards mov¬ 
ing template-fitting photo-z codes from a runtime-limited 
regime to a memory-limited one. Our main results are as 
follows: 

(i) Using a pre-generated grid of ~ 2 million elements 
{Az = 0.005), we create “maps” of the associated photo-z 
likelihood surfaces. For our chosen grid, we find that the 
surface is significantly “bumpy”, with a substantial number 
of minima occupying large areas compared to the region di¬ 
rectly surrounding the global best-fit value. 

(ii) We explore the use of “greedy” minimization algo¬ 
rithms on 2-D redshift-reddening slices to supplement photo- 
z searches. We find these 2-D slices remain significantly 
multi-modal such that locating the best-fit model proves 
to be difficult even with the help of metaheuristics such as 
basin-hopping. 

(iii) Building on these results, we design a specific al¬ 
gorithm Brisk Annealing-Driven Redshits (Z) (BAD-Z) to 
explore pre-generated grids of model photometry during 
photo-z searches through a combination of ensemble MCMC 
sampling and simulated annealing. 

(iv) Using a mock catalog of 384,662 COSMOS galaxies, 

we test the performance of BAD-Z over a wide wavelength 
(UYugrizY JH) and redshift (0 < < 3.2) range. Compared 

to a grid-based counterpart, we find BAD-Z is ~ 40 times more 

The code also runs relatively quickly, providing redshift PDFs 
to individual objects in ~ 1.5 s per core and to the entire mock 
catalog (parallelized with 32 threads) in a little over five hours. 
Significant gains could likely be achieved with optimization of the 
relevant code. 


computationally efficient, retains a similar level of accuracy, 
and performs robustly over the entire redshift range probed. 

These results are merely the first steps towards a rigor¬ 
ous attempt to improve photo-z’s and only part of a much 
larger, extended effort within the extragalactic astronomical 
community. For instance, almost all photo-z codes - includ¬ 
ing the one showcased in this work - utilize exclusively color 
information to derive P{z). This, however, ignores poten¬ 
tially important information such as clustering, morphology, 
angular size, and/or surface brightness, all of which might 
either improve accuracy or help distinguish the dominant 
mode(s) of a multimodal redshift PDF. In particular, in¬ 
corporating clustering information would be a useful next 
step towards exhausting all available information contained 
in photometric surveys (Menard et al. 2013; Newman et al. 
2015). 

In addition, this work has focused almost exclusively on 
template-fitting approaches to deriving photo-z’s, bypassing 
a wide range of machine learning techniques that are almost 
certainly critical in improving upon current photo-z method¬ 
ologies. In particular, (un)supervised machine learning ap¬ 
proaches such as Self-Organizing Maps (Kohonen 1982; Ko- 
honen 2001; Carrasco Kind & Brunner 2014a; Masters et al. 
2015, submitted) offer the opportunity to move beyond sim¬ 
ple inverse mapping approaches to instead incorporate prior 
knowledge about a given dataset in increasingly sophisti¬ 
cated ways (Dahlen et al. 2013; Carrasco Kind & Brunner 
2014b). 

Finally, while this work has focused on many of the 
more computationally-oriented avenues towards improving 
photometric redshifts, a major unresolved issue in current 
photo-z searches is the dual set of model uncertainties that 
arise due to the use of local galaxy templates (and emis¬ 
sion line scaling relations) to probe galaxies at much higher 
redshifts and the limited range of dust templates (and lack 
of priors; Repp et al. 2015) often used during the fitting 
process. Both of these avenues must be explored further in 
order to develop an improved set of templates and a better 
understanding of modeling uncertainties. Ultimately, there 
remain ample opportunities for further investigation in ad¬ 
vancing new techniques, creating superior template sets, and 
improving overall computational efficiency. 
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